| 1. | Risk-sensitive reinforcement learning algorithms with a generalized average criterion |
| 2. | A reinforcement learning algorithm based on process reward and prioritized sweeping is presented as a conflict-resolution strategy for multi-robot systems. |
| 3. | (4) A new multi-agent cooperation model called MACM is presented, and an improved distributed reinforcement learning algorithm based on this model is also proposed. |
| 4. | The first chapter of this paper provides a comprehensive survey of research on reinforcement learning theory, algorithms, and applications. Recent developments and future directions in mobile robot navigation control are also discussed. |
| 5. | Reinforcement learning has been applied successfully to single-agent environments. Because the theory assumes that the environment is Markovian, and that assumption no longer holds in multi-agent systems, traditional reinforcement learning algorithms cannot be applied directly to cooperative learning among multiple agents. |
| 6. | In this paper, by introducing joint actions into traditional reinforcement learning, a new multi-agent reinforcement learning algorithm based on behavior prediction is presented, and several methods for predicting other agents' behaviors are discussed. |
| 7. | The main reinforcement learning algorithms are also introduced, since reinforcement learning is related to the ant colony algorithm and has in recent years been applied to scheduling problems. 4. Some new concepts and scheduling algorithms for batch chemical processes were proposed in our studies. |
| 8. | Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied for estimating the optimal value function of Markov decision processes (MDPs) with continuous states and discrete actions. The state discretization of MDPs by Sarsa-learning algorithms based on CMAC networks and direct gradient rules is analyzed. Two new coding methods for CMAC neural networks, non-adjacent overlapping coding and variable-scale coding, are proposed to improve the convergence speed and generalization performance of CMAC-based direct gradient learning algorithms. |
| 9. | The control method requires neither knowledge of the system dynamics or an exact mathematical model, nor the training data that supervised learning needs. Using the proposed reinforcement learning algorithm and a modified genetic algorithm to optimize its weights, a neural network controller generates a time series of small perturbation signals that convert the chaotic oscillations of a chaotic system into the desired regular ones. Computer simulations on the Hénon map and the logistic chaotic system demonstrate that the strategy can stabilize low-period orbits such as period-1 and period-2. Moreover, when the periodic control methodology is used, higher-period orbits such as period-4 can also be successfully directed to the expected periodic orbits. |
| 10. | Based on the organization rules of Internet data, the distribution laws of hyperlinks, and the naming rules of URLs, an algorithm for rebuilding the tree-vine symbiosis data model (TVM) is established, and the satisfactory experimental results obtained with this algorithm verify the validity and soundness of the model. Furthermore, preliminary efforts are made to apply TVM to browsing navigation, web page classification, and reinforcement learning algorithms. |